Connects the Variables

As described earlier, correlation assesses the relationship between two continuous numeric variables

(as compared to categorical variables, as described in Chapter 8). This relationship can also be

evaluated with regression analysis to provide more information about how these two variables are

related. But perhaps more importantly, regression is not limited to continuous variables, nor is it

limited to only two variables. Regression is about developing a formula that explains how all the

variables in the regression are related. In the following sections, we explain the purpose of regression

analysis, identify some terms and notation typically used, and describe common types of regression.

Understanding the purpose of regression analysis

You may wonder how fitting a formula to a set of data can be useful. There are actually many uses.

With regression, you can

Test for a significant association or relationship between two or more variables. The process

is similar to correlation, but is more general and produces a unique equation or formula relating

the variables.

Get a compact representation of your data. A well-fitting regression model succinctly

summarizes the relationships between the variables in your data.

Make precise predictions, or prognoses. With a properly fitted survival function (see Chapter

23), you can generate a customized survival curve for a newly diagnosed cancer patient based on

that patient’s age, gender, weight, disease stage, tumor grade, and other factors to predict how long

they will live. A bit morbid, perhaps, but you could certainly do it.

Do mathematical manipulations easily and accurately on a fitted function that may be difficult

or inaccurate to do graphically on the raw data. These include making estimates within the range

of the measured values (called interpolation) as well as outside the measured values (called

extrapolation, and considered risky). You may also want to smooth the data, which is described in

Chapter 19.

Obtain numerical values for the parameters that appear in the regression model

formula. Chapter 19 explains how to make a regression model based on a theoretical rather than

known statistical distribution (described in Chapter 3). Such a model is used to develop estimates

like the ED50 of a drug, which is the dose that produces one-half the maximum effect.
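
To make a few of these uses concrete, here is a minimal sketch in Python, assuming the NumPy library is available. The data values, the variable names, and the choice of a straight-line model are all invented for illustration; they do not come from any study. The sketch fits a line to a small dataset, reads off the fitted parameters, and then makes one estimate inside the measured range (interpolation) and one outside it (extrapolation).

```python
import numpy as np

# Hypothetical data, made up for illustration only:
# x is the predictor and y is the outcome.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line y = intercept + slope * x by least squares.
# np.polyfit returns coefficients from the highest power down,
# so for degree 1 the order is (slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)


def predict(x_new):
    """Return the fitted line's estimate of y at x_new."""
    return intercept + slope * x_new


# Interpolation: 2.5 lies inside the measured range of x (1 to 5).
interpolated = predict(2.5)

# Extrapolation: 8.0 lies outside the measured range, so this
# estimate is riskier; the data say nothing about that region.
extrapolated = predict(8.0)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"estimate at x = 2.5 (interpolation): {interpolated:.3f}")
print(f"estimate at x = 8.0 (extrapolation): {extrapolated:.3f}")
```

The two numbers stored in slope and intercept are the compact representation of the five data points, and both estimates come from the fitted formula rather than from reading values off a graph.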

Talking about terminology and mathematical notation

A regression model is a formula that describes how one variable, the dependent variable,

depends on one or more other variables, and on one or more parameters. (While it is technically

possible to have more than one dependent variable in a model, a discussion of this type of

regression is outside the scope of this book.) The dependent variable is also called the outcome,

and the other variables are called independent variables or predictors. Parameters are the

other terms that appear in the formula. The statistical software you are using determines their

values so that the function comes as close as possible to the observed data.
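
As an illustration of this terminology, a model with a single predictor could be written as a straight line. The straight-line form and the symbols Y, X, a, and b are assumptions chosen here for the example, not notation taken from the text above.

```latex
Y = a + bX
```

In this example, Y is the dependent variable (the outcome), X is the independent variable (the predictor), and a and b are the parameters whose values the software estimates from the data.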